Interactive Plotting With Bokeh

Most of the time, a static visualization communicates your message effectively, but sometimes adding interactivity takes a visualization to the next level. Enter Bokeh.

"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets."

Here's the User Guide

In this post we'll use Bokeh to create bar plots, stacked bar plots, box plots, and histograms to get a feel for the library. In a later post we'll create more complex visualizations.

Additional Resources:

1. Texas Geo examples
2. High-level examples
3. Bokeh toy datasets
4. Data Source


Imports & Data Preparation

import pandas as pd

#read only necessary cols
df = pd.read_csv("HD_US.csv", usecols=["LocationAbbr",
                                       "Data_Value",
                                       "Stratification1",
                                       "Stratification2"])
#rename & drop nans
df = df.rename(columns={'LocationAbbr': 'State',
                        'Data_Value': 'HD Rate (per 100,000)',
                        'Stratification1':'Gender',
                        'Stratification2':'Race' }).dropna(axis=0, how = 'any')

df.head() #overall = aggregated for that region

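#mean hd by race (select the numeric column first; newer pandas raises on non-numeric columns)
mean_hd = df.groupby('Race')['HD Rate (per 100,000)'].mean().round(2).reset_index()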
mean_hd = mean_hd.rename(columns={'HD Rate (per 100,000)':'Mean Heart Disease Death (per 100,000)'})
mean_hd
   Race                                Mean Heart Disease Death (per 100,000)
0  American Indian and Alaskan Native  419.60
1  Asian and Pacific Islander          180.54
2  Black                               432.55
3  Hispanic                            207.37
4  Overall                             362.06
5  White                               366.18

Bar Plot

from bokeh.charts import Bar, show, output_notebook

#hover tooltips: '@Race' reads the Race column, '$y' is the cursor's y-coordinate
tooltips = [('Race', '@Race'), ('Value', '$y')]

p = Bar(mean_hd, 'Race', values='Mean Heart Disease Death (per 100,000)',
        title="Avg Heart Disease Mortality Rate (per 100,000) by Race (US)",
        tooltips=tooltips, color='Race', legend=False, plot_width=750)

output_notebook()
show(p)
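
A note for readers on newer Bokeh releases: the bokeh.charts module used in this post was later split out into the separate bkcharts package and is no longer part of Bokeh itself, so the import above may fail. Below is a rough sketch of a similar bar plot built with bokeh.plotting instead. The numbers are copied from the mean_hd table above. The function name make_bar_figure and the column names race and rate are made up for this example, and this is not the code that produced the plots in this post.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Values copied from the mean_hd table shown earlier in the post.
races = [
    "American Indian and Alaskan Native",
    "Asian and Pacific Islander",
    "Black",
    "Hispanic",
    "Overall",
    "White",
]
rates = [419.60, 180.54, 432.55, 207.37, 362.06, 366.18]


def make_bar_figure(races, rates):
    """Build a categorical bar plot with hover tooltips (hypothetical helper)."""
    source = ColumnDataSource(data=dict(race=races, rate=rates))
    p = figure(
        x_range=races,  # a list of strings makes the x-axis categorical
        title="Avg Heart Disease Mortality Rate (per 100,000) by Race (US)",
        tooltips=[("Race", "@race"), ("Rate", "@rate")],  # read from the data source
    )
    p.vbar(x="race", top="rate", width=0.8, source=source)
    p.xaxis.major_label_orientation = 0.8  # radians; tilts the long labels
    return p


p = make_bar_figure(races, rates)
```

In a notebook, calling output_notebook() and then show(p) should render the figure the same way as before. Note that the tooltip here reads the bar's value from the data source rather than the cursor position.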

Stacked Bar Plot

from bokeh.charts import Bar, show, output_notebook

p = Bar(df, label='Gender', values='HD Rate (per 100,000)', agg='mean', stack='Race',
        title="Avg Heart Disease Mortality Rate (per 100,000) by Race & Gender (US)",
        legend=False, tooltips=tooltips)

output_notebook()
show(p)
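
To make it clearer what the stacked plot encodes, here is a small plain-Python sketch of the aggregation we'd expect agg='mean' with stack='Race' to perform: one mean per (Gender, Race) pair, with the segments for each gender stacked on top of each other. The rows below are made-up placeholder numbers, not values from HD_US.csv, and mean_by_group is a hypothetical helper, not a Bokeh function.

```python
from collections import defaultdict

# Made-up placeholder rows (gender, race, rate); NOT values from HD_US.csv.
rows = [
    ("Male", "Black", 500.0),
    ("Male", "Black", 520.0),
    ("Male", "White", 400.0),
    ("Female", "Black", 300.0),
    ("Female", "White", 250.0),
    ("Female", "White", 270.0),
]


def mean_by_group(rows):
    """Return {(gender, race): mean rate} for the given rows."""
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per group
    for gender, race, rate in rows:
        acc = sums[(gender, race)]
        acc[0] += rate
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# One stacked segment per (gender, race) pair.
segments = mean_by_group(rows)

# The full height of each bar is the sum of its stacked segments.
bar_heights = defaultdict(float)
for (gender, _race), mean_rate in segments.items():
    bar_heights[gender] += mean_rate
```

One thing this makes visible: the total height of a stacked bar is a sum of group means, so it is not itself an average rate for that gender. The individual segments are the meaningful values.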

Box Plot

from bokeh.charts import BoxPlot, output_notebook, show

df = df[df['HD Rate (per 100,000)'] < 4000]  #drop extreme values (rates of 4000 or more)

p = BoxPlot(df, values='HD Rate (per 100,000)', label='Race',
            title="Avg Heart Disease Mortality Rate (per 100,000) by Race (US)",
            color = 'Race',legend=False, tooltips=tooltips)

output_notebook()
show(p)

Box Plot without Outliers

While outliers are nice to display, they sometimes add too much noise to a visualization. Passing outliers=False to BoxPlot hides them.

from bokeh.charts import BoxPlot, output_notebook, show

df = df[df['HD Rate (per 100,000)']< 4000]  #deal with outliers

p = BoxPlot(df, values='HD Rate (per 100,000)', label='Race',
            title="Avg Heart Disease Mortality Rate (per 100,000) by Race (US)",
            tooltips=tooltips, color = 'Race', outliers=False,
            legend=False)

output_notebook()
show(p)
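
For anyone wondering which points count as outliers in a box plot, here is a minimal sketch of the common 1.5 x IQR convention using only the standard library. Whether Bokeh's BoxPlot uses exactly this rule and this quartile method is an assumption on our part. The sample numbers are made up, and iqr_outliers is a hypothetical helper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Split values into inliers and outliers using the k * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inliers, outliers, (lower, upper)


sample = [10, 12, 13, 14, 15, 16, 18, 60]  # made-up numbers, not from the dataset
inliers, outliers, bounds = iqr_outliers(sample)
```

With this sample, only the value 60 falls outside the whisker bounds. Hiding outliers does not change the box or the whiskers; it only drops the individual points drawn beyond them.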

Grouped Histograms

from bokeh.charts import Histogram, output_notebook, show

p = Histogram(df, values='HD Rate (per 100,000)', color='Race',
              title="Avg Heart Disease Mortality Rate (per 100,000) by Race (US)",
              legend='top_right')

output_notebook()
show(p)
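
A histogram is just a count of values per bin, and with color='Race' we get one such set of counts per race. As a rough sketch of the binning step, assuming equal-width bins, here is a plain-Python version. The bins parameter, the histogram_counts helper, and the sample values are all made up for illustration; how Bokeh picks its default number of bins is not covered here.

```python
def histogram_counts(values, bins=5):
    """Return (bin_edges, counts) for equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin index `bins`, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


sample = [100, 150, 200, 250, 300, 350, 400, 500]  # made-up rates
edges, counts = histogram_counts(sample, bins=4)
```

For a grouped histogram you would run the same binning once per race, ideally with shared bin edges so the groups are comparable.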